Poster: onelson Date: Nov 5, 2016 1:10pm
Forum: texts Subject: Possible to search full text of a subset of books? (e.g. all by a publisher)

I am happy with the new ability to search within the text of books in the text archive, but I'm trying to figure out how to make it useful.
Question 1: If I click on an author or publisher name in a book's details page, which takes me to a results page showing all the texts that share that author/publisher/whatever, is there any way to search within the full text of just those results for a particular term?
Example:
I find an issue of Louisiana Conservationist (https://archive.org/details/louisianaconserv2456depa)
I think other issues of the same publication are likely to be useful.
So I click on the byline below the title (https://archive.org/search.php?query=creator%3A%22Department+of+Conservation%2C+State+of+Louisiana%22)
The following is in the search box:
creator:"Department of Conservation, State of Louisiana"

I've tried various ways to add a term to that search so I could search for only those issues that have a particular word, such as "gar". Everything I've tried has come up with no results, probably because "Department of Conservation, State of Louisiana" never appears inside the magazine.

I know in this case I can click on the topic link on the left that fits the search, getting me the 69 issues that might be relevant.

But what if I want to search for more than one term in the whole text archive? If I try to put 2 terms in a full-text search, even when not trying to limit it to a particular publication, the number of results is the same no matter what I change the second term to.
garfish gets 3725 results
garfish zeppelin gets 3725 results
garfish AND zeppelin AND black AND sabbath gets 3725 results
Garfish AND "black sabbath" gets 3725 results
Garfish OR "black sabbath" gets 3725 results
Also tried it with various symbols, slashes, parentheses, etc. Some things break the search, but nothing seems to narrow it.

Finally, the full text search is not integrated with the advanced search, is it?

Thanks.
Olaf

Reply

Poster: onelson Date: Dec 3, 2017 2:02pm
Forum: texts Subject: Re: Possible to search full text of a subset of books? (e.g. all by a publisher)

It's been a little over a year since I first posted this, and as far as I can tell there have been no changes that make searches work better. With millions of books in the archive, it would be nice to be able to search them effectively.
Can anyone in the know comment on what, if anything, might be in the works?
Thanks.

Reply

Poster: jbothma Date: Jan 19, 2017 10:39pm
Forum: texts Subject: Re: Possible to search full text of a subset of books? (e.g. all by a publisher)

I'll second the commendation of the full text archive, well done!

I realise it's in Beta and there's so much that has to be done around ES to provide a human with a reasonable interface, not to mention the challenge of integrating that with your existing search tools (basic search and drilldowns, advanced search syntax and the API). I experienced the above, and I think it could be addressed alongside some other improvements I hope you get to eventually. I might also be doing something wrong that makes my users' search experience less than ideal, so here goes:

I provide a search box at http://mfmamirror.github.io/ which sends the user to the archive.org search using the terms they provided, doing a full text search and limiting it to my collection (mfmasouthafrica).

- As above, I'm not sure the second term is being used in the search - the term [umlazi] has the same number of hits as the term [umlazi audit]
- - https://archive.org/search.php?query=umlazi%20audit&sin=TXT&and[]=collection%3A%22mfmasouthafrica%22
- - https://archive.org/search.php?query=umlazi&sin=TXT&and[]=collection%3A%22mfmasouthafrica%22

- The [] in and[] in the query string can't be URL-encoded. A web form encodes them automatically, but I had to set window.location = ... in JavaScript without encoding them. Am I confused about URL encoding?

- If I'm on a search result restricted to a collection (https://archive.org/search.php?sin=TXT&query=umlazi%20audit&and[]=collection%3A%22mfmasouthafrica%22) and I change the search term and submit, I lose the restriction to the collection (https://archive.org/search.php?query=umlazi&sin=TXT)
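
On the URL-encoding question, here is a minimal sketch of the two ways the search URL could be built in JavaScript. The parameter names (query, sin, and[]) are copied from the URLs above; the helper names buildEncoded, buildRawBrackets and readFilter are made up for illustration. It only shows that a standard URL parser decodes both forms to the same filter value. Whether the archive.org server treats the two forms the same way is not something this thread confirms.

```javascript
// Sketch only: parameter names come from the search URLs quoted above;
// the helper functions are hypothetical, not part of any archive.org API.
const BASE = "https://archive.org/search.php";

// Variant 1: let URLSearchParams encode everything, including the brackets.
function buildEncoded(term, collection) {
  const params = new URLSearchParams();
  params.append("query", term);
  params.append("sin", "TXT");
  params.append("and[]", `collection:"${collection}"`);
  return `${BASE}?${params.toString()}`;
}

// Variant 2: encode only the values and leave the brackets in and[] literal.
function buildRawBrackets(term, collection) {
  const filter = encodeURIComponent(`collection:"${collection}"`);
  return `${BASE}?query=${encodeURIComponent(term)}&sin=TXT&and[]=${filter}`;
}

// Decode the and[] filter back out of a URL with the standard parser.
function readFilter(url) {
  return new URL(url).searchParams.get("and[]");
}

const encoded = buildEncoded("umlazi audit", "mfmasouthafrica");
const raw = buildRawBrackets("umlazi audit", "mfmasouthafrica");
console.log(encoded); // brackets appear as %5B%5D, the space as +
console.log(raw);     // brackets stay literal, the space as %20
console.log(readFilter(encoded) === readFilter(raw)); // true
```

In a browser, either string could be assigned to window.location. If the encoded form returns different results from the raw-bracket form, that would point to how the server parses the query string rather than to a mistake in the encoding.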

I think I had an issue with selecting a collection and then losing another restriction of the search, perhaps full text, but that's working fine now so you probably have your own backlog of things you know can be improved.

Well done again!